Crystal structure of a GCN5-related N-acetyltransferase from Lactobacillus curiae

The 3D structure of a GCN5-related N-acetyltransferase enzyme that is selective for canavanine has been elucidated and shown to share the fold and catalytic mechanism of the polyamine acetyltransferase subclass.


Introduction
Acetylation is a major post-translational modification that is found in all domains of life (Favrot et al., 2016). It was first described as a regulatory mechanism in the 1960s, with the discovery of histone acetylation (Phillips, 1963) and of a bacterial aminoglycoside acetyltransferase that was shown to confer antibiotic resistance (Okamoto & Suzuki, 1965). The importance of this ubiquitous modification has become progressively established over the past decades, and it is now known to occur on multiple molecular targets, including proteins, polyamines, toxins, transfer RNA and cell-wall components (Burckhardt & Escalante-Semerena, 2020). Accordingly, it is involved in many cellular processes. Acetylation is catalysed by acetyltransferases, which form one of the largest known protein superfamilies, with more than 300 000 representatives (Salah Ud-Din et al., 2016). Acetyltransferases can be divided into three main classes: MYST (Pfam01853), p300/CBP (Pfam06466) and GCN5-related N-acetyltransferases (GNATs; Pfam00583) (Burckhardt & Escalante-Semerena, 2020). Despite their sequence and structural diversity, all acetyltransferases transfer an acetyl group from the cosubstrate acetyl-coenzyme A (Ac-CoA) to an amino group of a specific substrate; substrates vary widely across the enzymes, ranging from small metabolites such as amino acids to secondary metabolites such as antibiotics. In the GNAT class, the acetyl group can be transferred either to the ε-amino group (Nε) of a lysine residue or to the α-amino group (Nα) of an N-terminal residue (Favrot et al., 2016).
LcGNAT is the only canavanine-acetylating GNAT identified to date. It was further found that canavanyl-tRNAArg deacylase (CtdA) is also guanidine riboswitch-associated, and that both LcGNAT and CtdA occur in the same canavanine-rich habitats, such as the legume rhizosphere and the herbivore gut. Hence, it has been suggested that LcGNAT serves a biological purpose similar to that of CtdA, namely preventing the misincorporation of canavanine into the bacterial proteome, as acetylated canavanine cannot be used by the ribosome for protein synthesis.
Here, we determined the 3D structure of the guanidine riboswitch-associated enzyme LcGNAT to gain insight into the mechanistic diversity of this protein family and to facilitate future research on its catalytic mechanism. A particular future goal is to understand how the enzyme discriminates between the closely related substrates canavanine and arginine.

Cloning
The full-length, codon-optimized gene for LcGNAT (NCBI WP_035166819.1) was commercially synthesized (Thermo Fisher) and cloned into the expression vector pET-28a (EMBL vector collection) by restriction-site cloning and quick ligation (NEB). The vector adds an N-terminal His6 tag and a tobacco etch virus (TEV) protease cleavage site N-terminal to the inserted gene of interest.
LcGNAT variants carrying single amino-acid substitutions were generated by whole-plasmid PCR with overhang primers, followed by quick ligation (NEB).
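The overhang mutagenesis step can be illustrated with a short sketch. It assumes one common back-to-back primer layout (not necessarily the one used here): the forward primer carries the mutant codon as a 5′ overhang followed by template-annealing sequence, and the reverse primer anneals directly upstream of the codon, so the linear PCR product spans the whole plasmid and recircularizes on ligation. The function names, the toy template and the short annealing length are hypothetical.

```python
# Sketch: back-to-back primers for a single-codon substitution by whole-plasmid PCR.
# Assumption: the mutant codon sits as a 5' overhang on the forward primer. Ligation
# of the linear product needs 5'-phosphorylated ends. Template and lengths are
# hypothetical; real primers would also be checked for Tm and secondary structure.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_substitution_primers(template: str, codon_start: int, new_codon: str,
                                anneal_len: int = 20) -> tuple[str, str]:
    """Return (forward, reverse) primers replacing the codon at codon_start (0-based).

    forward: mutant codon + anneal_len template bases downstream of the codon.
    reverse: reverse complement of the anneal_len bases directly upstream of it.
    """
    if len(new_codon) != 3:
        raise ValueError("new_codon must be exactly three bases")
    if not 0 <= codon_start <= len(template) - 3:
        raise ValueError("codon_start lies outside the template")
    downstream = template[codon_start + 3:codon_start + 3 + anneal_len]
    upstream = template[max(0, codon_start - anneal_len):codon_start]
    return new_codon.upper() + downstream, reverse_complement(upstream)


def mutate(template: str, codon_start: int, new_codon: str) -> str:
    """Expected sequence after ligation: the template with one codon replaced."""
    return template[:codon_start] + new_codon.upper() + template[codon_start + 3:]


# Hypothetical toy template ATG AAA CTG TAT GGC ...; codon 4 (TAT, Tyr) -> TTT (Phe)
toy = "ATGAAACTGTATGGCGAAGCTTTACGT"
fwd, rev = design_substitution_primers(toy, codon_start=9, new_codon="TTT", anneal_len=6)
print(fwd, rev)  # TTTGGCGAA CAGTTT
```

Placing the mismatch at the 5′ end means the 3′ end of every primer anneals perfectly to the template; the trade-off is that the mutated bases contribute nothing to annealing in the first cycle.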

Protein production
For protein expression, the expression plasmids were transformed into Escherichia coli BL21-Gold(DE3) cells (Agilent), and a starter culture was grown overnight at 37 °C in Luria-Bertani medium supplemented with kanamycin (30 µg ml−1).
After a 1:500 dilution, the culture was grown at 37 °C and 200 rev min−1 to an OD600 of approximately 0.5. Protein expression was then induced with 0.5 mM isopropyl β-D-1-thiogalactopyranoside (IPTG), and the culture was grown at 18 °C for a further ~16 h. The cells were harvested by centrifugation and stored at −20 °C. Subsequently, the cells were resuspended in 50 mM Tris-HCl pH 8.0, 100 mM NaCl, 20 mM imidazole containing protease-inhibitor cocktail (cOmplete Mini, EDTA-free, Merck) and lysed by sonication. The sample was purified by immobilized metal-affinity chromatography (IMAC) using Ni2+-NTA agarose (Qiagen), and the protein was eluted with 50 mM Tris-HCl pH 8.0, 100 mM NaCl, 500 mM imidazole. The His6 tag was then removed by cleavage with TEV protease. The sample was dialyzed against 50 mM Tris-HCl pH 8.0, 100 mM NaCl to remove excess imidazole, and subtractive IMAC using Ni2+-NTA agarose (Qiagen) was performed to remove the protease, the cleaved tag and any LcGNAT that retained its tag. For size-exclusion chromatography, the sample was concentrated to 6 mg ml−1 (approximate volume 3 ml) and loaded onto a Superdex S75 16/60 column (GE Healthcare) pre-equilibrated in 50 mM Tris pH 8.0, 100 mM NaCl. The purity of the resulting protein sample was estimated to be >95% by SDS-PAGE with Coomassie Blue staining. Finally, the sample was concentrated to 15 mg ml−1 and stored at 4 °C until further use. The yield of purified LcGNAT was approximately 8 mg per litre of E. coli culture. Protein concentrations were determined from A280 values measured using a UV-Vis spectrophotometer (Eppendorf) by applying the Beer-Lambert law with a molar extinction coefficient (ε = 23 950 M−1 cm−1 at 280 nm) calculated from the sequence using ProtParam (Gasteiger et al., 2005).
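The concentration determination above reduces to the Beer-Lambert law, A = εcl, followed by conversion from molar to mass concentration. The sketch below uses the ε quoted above and the sequence-derived mass of 20.25 kDa reported in the Results; assuming that both values refer to the tag-cleaved construct, and the 1 cm path length, are assumptions. The function names are illustrative.

```python
# Sketch: protein concentration from A280 via the Beer-Lambert law, A = epsilon * c * l.
# Constants are the values quoted in this article; applying them to the
# tag-cleaved construct is an assumption.

EPSILON_280 = 23_950.0  # molar extinction coefficient at 280 nm, M^-1 cm^-1 (ProtParam)
MASS_DA = 20_250.0      # sequence-derived molecular mass, g mol^-1 (20.25 kDa)


def concentration_mg_per_ml(a280: float, path_cm: float = 1.0, dilution: float = 1.0,
                            epsilon: float = EPSILON_280, mass_da: float = MASS_DA) -> float:
    """Convert a blank-corrected A280 reading into mg/ml of the undiluted sample.

    dilution is the factor by which the sample was diluted before measurement.
    """
    molar = a280 * dilution / (epsilon * path_cm)  # mol/l in the undiluted sample
    return molar * mass_da                          # g/l, numerically equal to mg/ml


def a280_for(conc_mg_per_ml: float, path_cm: float = 1.0,
             epsilon: float = EPSILON_280, mass_da: float = MASS_DA) -> float:
    """Inverse: expected A280 of an undiluted sample at a given mg/ml concentration."""
    return conc_mg_per_ml / mass_da * epsilon * path_cm


print(round(concentration_mg_per_ml(1.0), 3))  # 0.846
print(round(a280_for(15.0), 1))                # 17.7
```

With these constants, an A280 of 1.0 corresponds to roughly 0.85 mg ml−1, and the 15 mg ml−1 stock would read about 17.7 in a 1 cm cuvette, which is why such samples are usually measured after dilution (the `dilution` argument) or with a shorter path length.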

X-ray crystallography
Crystals of LcGNAT grew from 1 M potassium sodium tartrate, 0.1 M MES-NaOH pH 6.0 in Intelli-Plates (Art Robbins) by sitting-drop vapour diffusion at 18 °C. Crystallization drops consisted of 200 nl protein solution and 200 nl reservoir solution (1:1 ratio). Prior to data collection, crystals were cryoprotected with Paratone-N (Hampton Research) and flash-vitrified in liquid nitrogen.
X-ray diffraction data were collected at 100 K on beamline PXI at the Swiss Light Source (SLS), Villigen, Switzerland. Data were processed with XDS and XSCALE (Kabsch, 2010), and model geometry was validated with MolProbity (Williams et al., 2018). Molecular images were rendered using UCSF Chimera (Pettersen et al., 2004). X-ray data statistics and model parameters are given in Table 1.

Results and discussion
LcGNAT has a conserved fold
The atomic structure of LcGNAT was determined to 1.95 Å resolution by X-ray crystallography (Fig. 1, Table 1). Representative electron density is shown in Supplementary Fig. S1. The crystal form obtained in this study contains two copies of the molecule in the asymmetric unit, which are essentially identical (r.m.s.d. of 0.73 Å over all 174 Cα atoms). In size-exclusion chromatography, LcGNAT eluted as a single species with an apparent molecular mass of approximately 23 kDa (Supplementary Fig. S2). As the molecular mass of LcGNAT calculated from the sequence is 20.25 kDa, this indicates that the enzyme is monomeric in solution under the experimental conditions used. PISA analysis of the protein-protein interface between the two copies in the asymmetric unit, as well as of potential dimers across crystallographic symmetry axes, assigned the lowest complex-formation significance score of 0 to all of these interfaces, suggesting that they arise from crystal packing only. We therefore conclude that LcGNAT is likely to be monomeric in its biologically relevant state.
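The r.m.s.d. values quoted in this study are deviations between paired Cα atoms after optimal rigid-body superposition. A minimal sketch of the Kabsch algorithm that computes such a value is given below; it assumes that the two coordinate sets are already paired atom by atom, that NumPy is available, and the function name is hypothetical. It is not necessarily the procedure used to obtain the values reported here.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """r.m.s.d. between paired (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centring each set on its centroid.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # SVD of the 3x3 cross-covariance matrix yields the optimal rotation.
    U, _S, Vt = np.linalg.svd(P_c.T @ Q_c)
    # Flip the last axis if needed so that R is a proper rotation (det = +1), not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q (row vectors, hence R.T) and average squared deviations over atoms.
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Hypothetical check: a rotated, translated copy of random "C-alpha" coordinates.
rng = np.random.default_rng(0)
P = rng.normal(scale=10.0, size=(174, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(P, Q) < 1e-8)  # True
```

With real models, the paired Cα coordinates would first be extracted from the two chains (libraries such as Biopython provide `Bio.PDB.Superimposer` for the same superposition). The choice of atom pairs, rather than the superposition itself, determines how many atoms enter the value.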
The LcGNAT fold approximates, but does not exactly follow, the canonical topology of the GNAT enzyme family, β0-β1-α1-α2-β2-β3-β4-α3-β5-α4-β6 (Favrot et al., 2016; Burckhardt & Escalante-Semerena, 2020). Instead, LcGNAT consists of seven β-strands and four α-helices arranged as β1-α1-α2-β2-β3-β4-α3-β5-α4-β6-β7, thus lacking the N-terminal β0 strand and carrying an additional C-terminal β7 strand (Fig. 1a and Supplementary Fig. S3). This divergence has also been observed in various other polyamine acetyltransferases (see below) and agrees with the observation that the N- and C-termini of this fold contain the least conserved secondary-structure elements (Salah Ud-Din et al., 2016). As is common in this enzyme family, LcGNAT contains an extensive tunnel that perforates the protein fold and is generated by a V-shaped splaying of strands β4 and β5 (Fig. 1a). In GNATs, this V-shaped feature accommodates the pantothenate moiety of Ac-CoA (Wybenga-Groot et al., 1999) and has also been reported to contribute to the oxyanion hole that polarizes the thioester carbonyl of the reaction intermediate (Bhatnagar et al., 1998; Farazi et al., 2001). At one entrance to the tunnel lies the pyrophosphate-binding loop (P-loop; Fig. 1a), which binds the pyrophosphate moiety of the Ac-CoA cosubstrate. The P-loop sequence is highly conserved in GNATs and carries the consensus motif R/Q-X-X-G-X-A/G (Favrot et al., 2016; Burckhardt & Escalante-Semerena, 2020). Interestingly, the P-loop of LcGNAT is degenerate, as is often the case in polyamine acetyltransferases (Fig. 1b). As these enzymes are nevertheless Ac-CoA-dependent, it can be inferred that the sequence differences in the P-loop do not result in a significant functional difference.
Figure 1. The structure and sequence of LcGNAT are conserved among polyamine acetyltransferases. (a) Crystal structure of LcGNAT. Catalytic residues that were subjected to mutagenesis in this study are displayed, and the P-loop is highlighted. (b) Sequence alignment of LcGNAT and its four closest homologues with known 3D structures. The sequences correspond to Bacillus subtilis PaiA (BsPaiA), Streptococcus mutans putative acetyltransferase (SmGNAT), Thermoplasma acidophilum PaiA (TaPaiA) and T. volcanium N-acetyltransferase (TvArd1). PDB codes are given in parentheses. Residue numbering follows the LcGNAT sequence. Secondary structure is colour-coded as in (a), with β-strands in green and α-helices in yellow. β-Strands with a heavy outline are the two strands that form the V-shaped splay. Identical residues are shaded in yellow. Ac-CoA-binding residues were defined by PDB sequence annotations and are shown in green. The P-loop is boxed. Black dots indicate residues that line the substrate tunnel. Red stars highlight residues that were mutated for the acetylation activity assay.
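A consensus such as R/Q-X-X-G-X-A/G maps directly onto a regular expression, which is one simple way to scan sequences for a canonical P-loop. The sketch below is illustrative only: the toy sequence is hypothetical, and a strict pattern of this kind may miss degenerate P-loops such as that of LcGNAT, which is why structure-based comparison remains necessary to locate them.

```python
import re

# R/Q-X-X-G-X-A/G: X = any residue; "/" separates alternatives at one position.
# The lookahead lets finditer report overlapping matches as well.
P_LOOP = re.compile(r"(?=([RQ]..G.[AG]))")


def find_p_loop_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, hexapeptide) for every consensus P-loop match."""
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(sequence.upper())]


# Hypothetical toy sequence (not LcGNAT):
print(find_p_loop_motifs("MAQRQGHGAKRDEGLA"))  # [(3, 'QRQGHG'), (11, 'RDEGLA')]
```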

The functional groups of LcGNAT
Efforts to determine the structure of LcGNAT in complex with Ac-CoA and/or the canavanine substrate were not successful in this study. We therefore aimed to identify the active-site residues by comparison with characterized homologues of known three-dimensional structure bound to the Ac-CoA cosubstrate. To this end, we searched the Protein Data Bank for homologues and identified four. The closest structural homologue of LcGNAT was a member of the spermidine/spermine N1-acetyltransferase (SSAT) family from the Gram-positive bacterium Bacillus subtilis (BsPaiA; PDB entry 1tiq; Forouhar et al., 2005). Despite its different target substrate, this enzyme shares 46.6% sequence identity and 66.1% sequence similarity with LcGNAT. The next closest homologues were a GNAT from Streptococcus mutans (SmGNAT; PDB entry 4e2a) and PaiA from Thermoplasma acidophilum (TaPaiA; PDB entry 3k9u; Filippova et al., 2011), with sequence identities of 37.0% and 21.7% and similarities of 61.3% and 46.9%, respectively, followed by the more distantly related Ard1 from T. volcanium (TvArd1; PDB entry 4pv6; Ma et al., 2014), with a sequence identity of 24.3% and a similarity of 39.1% (Fig. 1b). As expected from the sequence similarity, LcGNAT and the four homologues share the same, closely superimposable fold (Fig. 2a). Structural variability mainly affects the β-hairpin β6-β7 and, in SmGNAT, also the preceding helix α5. We therefore conclude that the α5-β6-β7 region is the most flexible and dynamic part of this modified version of the GNAT fold.
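Percent identity values such as those above depend on how gap columns are counted. The sketch below computes identity from a pre-computed pairwise alignment under three common denominator conventions; which convention underlies the reported values is not stated, and similarity would additionally require a substitution-matrix grouping (e.g. positive BLOSUM62 scores) that is not implemented here. The function name and the aligned toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, denominator: str = "aligned") -> float:
    """Percent identity of two equal-length aligned sequences ('-' marks a gap).

    denominator:
      "aligned": columns in which neither sequence has a gap
      "length":  full alignment length, including gap columns
      "shorter": ungapped length of the shorter sequence
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = list(zip(aligned_a.upper(), aligned_b.upper()))
    # Count identical residues; gap-gap columns are not counted as identities.
    identical = sum(1 for a, b in pairs if a == b and a != "-")
    if denominator == "aligned":
        n = sum(1 for a, b in pairs if a != "-" and b != "-")
    elif denominator == "length":
        n = len(pairs)
    elif denominator == "shorter":
        n = min(len(aligned_a.replace("-", "")), len(aligned_b.replace("-", "")))
    else:
        raise ValueError(f"unknown denominator: {denominator}")
    return 100.0 * identical / n if n else 0.0


# Hypothetical toy alignment: 4 identities over 6 gap-free columns.
print(round(percent_identity("MKT-LYA", "MRTQLYG"), 1))  # 66.7
```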
The structures of BsPaiA and TvArd1 were elucidated in complex with CoA and those of TaPaiA and TvArd1 in complex with Ac-CoA. In all cases, the cosubstrate binds within the tunnel at the centre of the fold. However, despite the close similarity of the shared fold, the binding modes of CoA/Ac-CoA differ markedly between the enzymes (Fig. 2b). Binding of the pyrophosphate moiety by the P-loop is the most conserved feature across the available liganded structures, with little conservation of cosubstrate conformation outside this loop (Fig. 2b). This conformational diversity prevents reliable modelling of Ac-CoA binding to LcGNAT. However, given the high sequence similarity of LcGNAT and BsPaiA and the close overlap of their structures (r.m.s.d. of 1.27 Å over all 170 Cα atoms; Fig. 2a), we found that the binding mode of Ac-CoA observed in BsPaiA is compatible with LcGNAT (Fig. 2c). The Ac-CoA modelled in this way in LcGNAT was oriented such that the P-loop accommodated the pyrophosphate moiety well and the acetyl group pointed into the tunnel. Unfortunately, our efforts to identify the canavanine-binding site of LcGNAT, and thus to determine how this enzyme achieves its selectivity, were unsuccessful. To locate a possible canavanine-binding interface, we examined the non-Ac-CoA side of the tunnel, in particular the residues within 5 Å of the lysine substrate in the recent crystal structure of the distantly related moss spermine/spermidine acetyltransferase (PpSSAT; PDB entry 7zkt; Bělíček et al., 2023; Fig. 1b). Remarkably, the residues thus identified are largely conserved across polyamine acetyltransferases and do not explain the selectivity of LcGNAT for canavanine. This suggests that additional residues, which cannot be identified at present, mediate the binding of the larger canavanine substrate in LcGNAT.
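Selecting residues within 5 Å of a bound ligand, as done above for PpSSAT, is a simple distance query over atom coordinates. The following is a brute-force sketch using a hypothetical minimal atom record (residue name, residue number, atom name, coordinates); chain identifiers, alternate conformations and hydrogen atoms, which a real analysis would need to handle, are ignored, and the toy coordinates do not correspond to any real structure.

```python
import math

# Hypothetical minimal atom record: (residue name, residue number, atom name, (x, y, z) in Å).
Atom = tuple[str, int, str, tuple[float, float, float]]


def residues_near_ligand(protein_atoms: list[Atom],
                         ligand_xyz: list[tuple[float, float, float]],
                         cutoff: float = 5.0) -> list[tuple[str, int]]:
    """Residues with at least one atom within `cutoff` Å of any ligand atom, sorted by number."""
    hits = set()
    for resname, resnum, _atom_name, xyz in protein_atoms:
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_xyz):
            hits.add((resname, resnum))
    return sorted(hits, key=lambda res: res[1])


# Toy coordinates (hypothetical): one ligand atom at the origin.
atoms = [
    ("ALA", 1, "CB", (0.0, 0.0, 4.0)),   # 4.0 Å away -> selected
    ("GLY", 2, "CA", (0.0, 0.0, 6.0)),   # 6.0 Å away -> not selected
    ("SER", 3, "OG", (3.0, 4.0, 0.0)),   # exactly 5.0 Å -> selected (cutoff is inclusive)
]
print(residues_near_ligand(atoms, [(0.0, 0.0, 0.0)]))  # [('ALA', 1), ('SER', 3)]
```

The double loop scales with the product of protein and ligand atom counts, which is negligible for a single small ligand; larger queries would typically use a spatial index.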

Catalytic residues
An overall sequence alignment of LcGNAT with the four homologues described above showed that the enzymes are highly conserved in the regions involved in Ac-CoA binding and in residues thought to be catalytically important (Fig. 1b). In BsPaiA, the side chain of a conserved tyrosine residue (Tyr142) was shown to interact with the S atom of CoA and has been proposed to act as a general acid in catalysis (Forouhar et al., 2005). However, a study of an Enterococcus faecium GNAT proposed that the equivalent residue, Tyr147, is not involved in the chemical catalysis of the reaction but instead dictates the optimal orientation of the acetyl group for transfer (Draker & Wright, 2004). Although its exact mechanistic role remains unclear, the review by Salah Ud-Din et al. (2016) reported that an equivalently positioned tyrosine residue is crucial for catalysis in nearly all GNAT enzymes described to date. In addition, in human SSAT the conserved residue Glu92 was proposed to act as a general base that performs a water-mediated proton abstraction from the substrate (Hegde et al., 2007), and mutagenesis has confirmed a catalytic role for this residue in other GNATs (reviewed by Salah Ud-Din et al., 2016). Both Glu92 and Tyr142 are also present in LcGNAT. We therefore generated the LcGNAT variants E92Q and Y142F by site-directed mutagenesis and tested their enzymatic activity. Both substitutions impaired catalysis, confirming that these residues are also catalytically relevant in LcGNAT (Fig. 3). Interestingly, LcGNAT and BsPaiA share an additional tyrosine (Tyr97) in the Ac-CoA-binding site that was annotated as mediating cosubstrate binding in BsPaiA (Forouhar et al., 2005). In the crystal structure of BsPaiA, Tyr97 and Tyr142 are at a similar distance from the carbonyl moiety of the acetyl group. A similar residue, Tyr93, has also been proposed to be involved in cosubstrate positioning in TaPaiA (Filippova et al., 2011).
However, a tyrosine residue is not conserved at this position across all GNATs (Fig. 1b). We mutated Tyr97 in LcGNAT to phenylalanine, which is similar in size but lacks the side-chain hydroxyl group and is therefore catalytically inert. The Y97F variant also showed significantly decreased catalytic activity (Fig. 3). These mutational results strongly suggest that LcGNAT shares its catalytic mechanism with other GNATs, although it remains unclear which tyrosine (Tyr97 or Tyr142) acts as the general acid in LcGNAT and BsPaiA. Speculatively, in GNATs in which both tyrosine residues are present, one could act as the acid while the other positions the acetyl group.

Data availability
Model coordinates and diffraction data have been deposited with the Protein Data Bank under accession code 8osp. X-ray diffraction images have been deposited at https://doi.org/10.5281/zenodo.7848164.